Understanding Emotion in Hemingway’s and Fitzgerald’s work

Introduction and Research Question

This project is a computational analysis of emotion in selected twentieth-century literary texts. Because emotion is subjective, it is typically analysed through close reading alone. A data-driven method, however, forces us to ask different questions. What are the ways in which emotions are written? How do they differ across authors, subjects, and times? Which characters, genres and narratives are they most associated with? How does one quantify emotion? Is it possible to quantify it at all, and how does doing so change the focus of literary studies and stylistic analysis?

We will briefly explore some of these questions in our analysis of F. Scott Fitzgerald and Ernest Hemingway. We aim to use data to find out which author’s work is more emotional, and why. We also ask how these emotions manifest in and around their protagonists.

Hypothesis

The novels of Fitzgerald will have more emotion than the novels of Hemingway.

Corpus Description

We undertook a comparative study of the novels of Ernest Hemingway and F. Scott Fitzgerald. The corpus was assembled based on availability. We had to access Hemingway’s works through Libgen, and very few plaintext files were as clean as we wanted them to be. Several files contained non-text characters, which would have affected our analysis. Our initial plan was to use four novels for each author, since Fitzgerald has only four finished novels. But given the short length of The Old Man and the Sea, we chose to add The Garden of Eden to our analysis too, bringing the total for Hemingway to five novels.

Despite this imbalance, we believe the comparison will yield interesting results for two reasons. First, both authors wrote in the early twentieth century, during what was called the “Jazz Age”, but in very different styles and about very different subject matter. Fitzgerald’s world held soirees, idle wealth, and hedonism, while Hemingway’s held grit, frugality, and violence. Their writing styles are also quite distinct: Hemingway’s work is characterized by short, clipped sentences with few adverbs, and Fitzgerald’s by more verbose, descriptive writing. Second, what they do have in common is a picture of post-war industrial America, fraught with disillusionment but slowly rebuilding.

In such a dynamic setting, we thought it worthwhile to do a reading of emotion in the works of both authors. Both literature and affect do, after all, lie at the intersection of the individual and the social. Our methodology is therefore two-pronged. On the one hand, we conduct a stylistic study of the writing of emotions; on the other, we conduct a contextual analysis of how characters in various social worlds feel and interact with these emotions. Our texts for each author are as follows:

Hemingway                    Fitzgerald
The Old Man and the Sea      This Side of Paradise
A Farewell to Arms           The Great Gatsby
For Whom the Bell Tolls      Tender is the Night
The Sun Also Rises           The Beautiful and Damned
The Garden of Eden

Summary

As mentioned above, we used five texts for Hemingway and four for Fitzgerald to analyse the presence of emotion in their writing. The corpus is close to 800,000 words long, which falls short of the 1 million words we had treated as a safe cut-off. However, because we wanted to stick to Fitzgerald’s novels rather than include his short stories, the corpus length was limited.

The length of the novels varied considerably. Each author’s list includes one notably short novel, The Old Man and the Sea for Hemingway at close to 28,000 words and The Great Gatsby for Fitzgerald at 48,700 words, alongside longer novels.

Fitzgerald’s novels have a higher vocabulary density (the ratio of unique words in a novel to its total word count) and a longer average sentence length than Hemingway’s. For Fitzgerald’s novels, the vocabulary density ranges from 0.097 to 0.122, while for Hemingway’s it ranges from 0.049 to 0.099. This points to a difference in their writing styles and range of word choice, fitting our intuition that Fitzgerald’s work is more descriptive and uses more complex, verbose language. The same holds for sentence length: Fitzgerald’s average words per sentence range from 14.7 to 15.9, while Hemingway’s range from 8.8 to 14.4.
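The two measures above can be sketched in a few lines. The snippet below is an illustrative Python sketch, not the R code used in the project. It assumes that vocabulary density means unique words divided by total words (which is what values between 0 and 1 imply), and it uses a simple regular-expression tokenizer and a naive sentence splitter; the function names and the sample sentence are our own inventions for the example.

```python
import re

def tokenize(text):
    """Return lowercase word tokens; apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def vocabulary_density(text):
    """Unique words divided by total words (a type-token ratio)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def average_sentence_length(text):
    """Mean number of word tokens per sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

# Invented sample text, used only to exercise the functions.
sample = "He was an old man. He fished alone. The old man was thin."
density = vocabulary_density(sample)
avg_len = average_sentence_length(sample)
print(round(density, 3), round(avg_len, 2))
```

One caveat worth keeping in mind when reading such figures: a ratio of unique to total words tends to fall as a text gets longer, so comparisons between novels of very different lengths should be made with some care.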

With regard to unique terms, the most distinctive words in every novel except The Old Man and the Sea are character names. The Old Man and the Sea is probably an exception because it has so few human characters; there, terms like dolphin and shark stand out as distinctive words instead.

Thus, our corpus summary supports our intuition about the differences between Hemingway’s and Fitzgerald’s writing styles, and our subsequent sections will focus on how emotions get expressed in these differing styles.

Data Visualization 1: Sentiment Analysis/Relative Frequency

This first visualisation is meant to determine which author is more emotional. In qualitative reading of literature, it is easy to question what it means for one text to be more emotional than another. One may argue that even simple descriptive paragraphs have emotion attached to them. But it is important to note that our tools will only tell us which author’s work is more overtly emotional: that is, which author’s work will let us decipher emotion without having to analyse subtext or metaphors. Close reading may or may not corroborate our findings, and a lack of emotion in our findings definitely does not mean that an author is emotionless.

To answer our question, we created a lexicon with custom words to represent various emotions. These include basic, broad ones like joy and anger, but also more specific ones such as pride and resentment. A function also allows us to pick up on some variations of form in these words. So while our original lexicon has the word “happy”, the function will also include “happiness”, “happier”, “happiest”, etc. by looking for the prefix “happi” and cleaning out all irrelevant results. However, since this lexicon has been manually compiled, there is always the danger of missing out on possible variations.
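The prefix matching described above could look roughly like the following. This is an illustrative Python sketch rather than the project’s R function; the stems, the exclusion list and the function name are hypothetical. Note that a prefix such as “happi” does not match “happy” itself, so in this sketch the base word is listed alongside the stem.

```python
import re

# Hypothetical stems per emotion; the real lexicon is larger and hand-compiled.
EMOTION_STEMS = {
    "joy": ["happy", "happi", "joy"],
    "anger": ["ang", "furi"],
}

# Hypothetical clean-up list: words that share a prefix but carry no emotion.
EXCLUDE = {"angle", "angles", "angel", "angels"}

def match_emotion_words(tokens, stems, exclude):
    """Group tokens by emotion when they start with one of that emotion's stems."""
    matches = {}
    for emotion, prefixes in stems.items():
        matches[emotion] = [
            t for t in tokens
            if t not in exclude and t.startswith(tuple(prefixes))
        ]
    return matches

# Invented sentence, used only to exercise the function.
sentence = "She was happier than the angry angel and full of happiness"
tokens = re.findall(r"[a-z]+", sentence.lower())
matched = match_emotion_words(tokens, EMOTION_STEMS, EXCLUDE)
print(matched)
```

The exclusion list stands in for the “cleaning out” step: a short prefix like “ang” catches “angry” but also “angel”, which has to be removed by hand.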

This gave us the relative percentage of emotion words in each author’s work, as a proportion of all words used in the text.
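As a worked illustration of this proportion, the sketch below computes emotion words as a percentage of all words for each text. It is a Python sketch under stated assumptions, not the project’s R code: the word list, the text labels and the sample sentences are invented, and tokenization is a plain whitespace split.

```python
def emotion_percentage(tokens, emotion_words):
    """Emotion-word tokens as a percentage of all tokens in a text."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in emotion_words)
    return 100.0 * hits / len(tokens)

# Hypothetical lexicon and toy texts, for illustration only.
EMOTION_WORDS = {"glad", "sad", "proud"}
texts = {
    "novel_a": "he was glad and she was sad and tired",
    "novel_b": "the boat went out far past the other boats",
}

percentages = {
    title: emotion_percentage(text.split(), EMOTION_WORDS)
    for title, text in texts.items()
}
print(percentages)
```

Dividing by the total number of words is what makes novels of different lengths comparable on the same scale.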

As the graph indicates, Fitzgerald scores much higher on the occurrence of emotion words. His least emotional novel, The Great Gatsby, at 2.5%, still scores higher than Hemingway’s most emotional novel, A Farewell to Arms, at around 2.1%. The Old Man and the Sea has the lowest score in the corpus.

The reason for Hemingway’s scores is largely stylistic. Because a lot of emotion is directly located in adverbs, a good percentage of our lexicon consists of adverbs, which are exactly what Hemingway’s writing lacks. A lot of Hemingway’s characters were also hypermasculine and shied away from being upfront about their feelings. In fact, if one were to read closely, it is by means of this stoic masculine facade that their true feelings can be deciphered. Our tools are not equipped to do this. They look at words rather than sentences or paragraphs, so they lack any reading whatsoever of subtext or context. They cannot pick up on analogies, or metaphors, or the manner in which authors choose to express characters’ emotions through their expressions, or reactions.

Fitzgerald’s relatively higher percentage reflects both his writing style and the larger themes in his novels. His writing style is more descriptive, and allows more space to explore various emotions. His novels also focus on the emotional bonds between characters, especially within romantic relationships, which is a likely reason for the higher score too. Several of his male main characters have romantic relationships through which the important themes of the book emerge, so these relationships need to be written and described in detail.

Data Visualization 2: Specific Emotions in Authors

For our next step, we used the NRC emotion lexicon to determine which emotions were highest. This would ideally give us an insight into the characters’ social worlds, as well as the thematic foci of the authors.

To avoid an overcrowded graph, we picked four emotions from the positive end of the spectrum (trust, anticipation, surprise, and joy – although surprise can go both ways) and four from the negative end (anger, sadness, fear, and disgust). Unlike our previous lexicon, this lexicon was pre-made and external, and it could have missed out on certain terms.
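A word-emotion lexicon of this kind maps each word to one or more emotion categories, so a single word can count towards several emotions at once. The Python sketch below shows the counting logic with a tiny toy lexicon. The entries are invented for the example and are not taken from the NRC lexicon, and this is not the R code used in the project.

```python
from collections import Counter

# Toy word-to-emotions mapping; entries are invented, not real NRC data.
TOY_LEXICON = {
    "war": {"fear", "anger"},
    "friend": {"trust", "joy"},
    "storm": {"fear"},
    "gift": {"joy", "anticipation", "surprise"},
}

def emotion_profile(tokens, lexicon):
    """Percentage of tokens associated with each emotion category."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {emotion: 100.0 * n / total for emotion, n in counts.items()}

# Invented sentence, used only to exercise the function.
tokens = "the war and the storm frightened his friend".split()
profile = emotion_profile(tokens, TOY_LEXICON)
print(profile)
```

In the toy sentence, “frightened” contributes nothing because it is not in the toy lexicon, which mirrors the risk noted above that an external lexicon can miss relevant terms.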

Here, Hemingway scores significantly higher on anger and fear, and marginally higher on disgust, trust and sadness. All four of the “negative” emotions are higher in his work. Intuitively, this makes sense for anger and fear, because he writes characters with lots of machismo and puts them in demanding settings like war or at the mercy of nature. A Farewell to Arms features a protagonist who suffers the pain of war and is disillusioned after it, while The Sun Also Rises speaks of how the protagonist’s experiences at war cripple his romances as well. We can justify his usage of trust too, because people in these settings often develop trust just by virtue of surviving their hardships together (as in The Old Man and the Sea).

What’s interesting (and not intuitive) is that Fitzgerald has barely any words that express disgust. One would assume that it is an easy emotion for snobbish upper-class people to feel, especially when they are continually in unhappy love affairs. There is no shortage of settings in which disgust could emerge: there are characters who undergo a “fall” and resort to alcoholism, debauchery, and self-pity after failures in relationships, and characters who cheat on and betray their significant others. Despite this, there is little to no disgust. Perhaps these results are due to us conflating disgust with shame, or with another related emotion that may show up in future tests.

Data Visualization 3: Named Entity Recognition

While NRC gives us a powerful picture of which emotions dominate an author’s work, we wanted to inquire further into where and from whom they come. Hence, we combined this with Named Entity Recognition (NER) and sentiment analysis to map the emotions surrounding the characters in each novel. For the protagonists, we retained the emotion lexicon, so we could pinpoint specific emotions; for the other characters, we simply generated a positive or negative sentiment score.
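One simple way to attach a sentiment score to a character is to score every sentence that mentions the character’s name and add those scores up. The Python sketch below illustrates that idea under several assumptions: the character list stands in for the cleaned output of an NER step, sentence-level co-occurrence is our assumed unit of association, and the polarity lexicon, names and sample text are all invented. It is not the R pipeline used in the project.

```python
import re
from collections import defaultdict

# Toy polarity lexicon; entries and weights are invented for illustration.
TOY_POLARITY = {"love": 1, "kind": 1, "brave": 1,
                "cruel": -1, "careless": -1, "lost": -1}

def character_sentiment(text, characters, polarity):
    """Sum the polarity of every sentence that mentions each character."""
    scores = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        sentence_score = sum(polarity.get(t, 0) for t in tokens)
        for name in characters:
            if name.lower() in tokens:
                scores[name] += sentence_score
    return dict(scores)

# Invented names and text; the name list stands in for cleaned NER output.
text = "Anna was kind and brave. Boris was careless. Anna lost the letter."
scores = character_sentiment(text, ["Anna", "Boris"], TOY_POLARITY)

# Rank characters from most positive to most negative.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Replacing the polarity lexicon with a word-to-emotions lexicon would give per-emotion totals for a protagonist in the same way. A sentence that mentions two characters credits both with its score, which is one reason such scores are best read as rough indicators.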

Who the protagonist is in a story can be unclear, but we have used the safest assumption based on research. This analysis would be more difficult with a larger corpus because of how time-consuming the cleaning process for NER is. The amount we learn about each protagonist varies: some novels provide great insight into their emotions, while others, such as The Great Gatsby, place less emphasis on them.

Hemingway’s protagonists seem to consistently have higher scores for trust than for other emotions. Interestingly, despite his protagonists being presented as extremely “masculine” characters, thrust into situations of war or grappling with nature, they score higher on trust than they do on anger. No characters are particularly angry: instead, sadness and trust, and in one case fear too, stand out more. This raises an interesting point about these characters and the manner in which Hemingway thought about masculinity in the post-war era he was writing in. Trust is a relational emotion: it emerges within the context of a relationship or a cause, and is thus an indicator of these in the novels. As demonstrated by characters such as Henry from A Farewell to Arms, the comparatively high scores for trust could be a product of the relationships in Hemingway’s novels and of the emphasis on men “finding themselves”, which in Henry’s case means bidding goodbye to war to embrace a life of love instead. Hemingway’s choice to make his main characters masculine but not intensely angry allows him to create characters who experience complex emotional conflict, and who thus cannot be typecast as one-dimensional.

Even so, Hemingway’s characters are angrier than Fitzgerald’s (except for Dick Diver), which fits with our assumptions about them. It is possible that Hemingway did not write very angry characters in absolute terms, but that, compared to other authors of his time, he wrote people who are angrier than the average character. This is not accounted for in our analysis.

Fitzgerald’s protagonists score high on trust and anticipation as compared to other emotions. This is likely because of the emphasis on romantic relationships in his books, and the way these relationships are presented as causing the downfall of the male protagonist, which foregrounds the presence (or absence) of trust in them. The high score for anticipation could be a product of two factors: first, the male character’s yearning for a particular female character, and second, the build-up to the protagonist’s downfall. The main character’s longing for a female character can be seen in The Great Gatsby, where Gatsby holds parties in anticipation of Daisy coming to them. Similarly, the build-up to a character’s downfall is visible in Tender is the Night, where the reader is witness to Dick Diver’s slow fall from grace.

His protagonists also feel more joy than sadness (barring Dick Diver), and this stands out given our initial assumption that Fitzgerald’s novels would have more melancholic characters. However, apart from Gatsby, none of them particularly feel surprise, which is, if we may, quite surprising. This is partly because the protagonist of any story typically has unforeseen events happen to them, and is thus surprised to at least some extent; and partly because a lot of Fitzgerald’s protagonists go through situations similar to Gatsby’s, in terms of being abandoned by a lover. Perhaps it is Gatsby’s murder that skews the results, or perhaps the other protagonists are simply not surprised by these events, and instead anticipated them, given the reality of their lives.

Our analysis also includes a list of the most positive and negative characters in each novel, created through sentiment analysis that generated a score for each of them. Scores on the positive end mean the character had positive emotions associated with them, while negative scores mean the opposite. This provides insights into how emotion and gender are linked. Fitzgerald has more negative women than positive women. This could be reflective of how his women influence the protagonists’ lives, as in The Great Gatsby and Tender is the Night, where they are presented as leading to the protagonist’s downfall. He presents these romantic relationships as all-consuming, with the woman either cheating on the protagonist or even indirectly causing his death. Some of Fitzgerald’s female characters, such as Nicole Diver, are also presented with a history of mental illness, which contributes to the negative sentiment assigned to them.

Hemingway, on the other hand, has more positive women. This could be reflective of the idealised image of women his protagonists have, where they are put on moral pedestals that also become cages. Some of Hemingway’s women are incredibly submissive and obedient, which he saw as a thing of great value. One such character is Catherine from A Farewell to Arms, but interestingly, she does not appear in the list of most positive characters from the novel. The two women who do appear are her colleagues, who are secondary to the main romance. This may be because the lexicon classified a lot of what Catherine did as neutral, rather than positive or negative. Errors of this kind are always possible when sentiment analysis is applied to characters and dialogue.

Conclusion

The research supports our hypothesis, and goes on to further complicate it. As the data has shown, Fitzgerald does have more emotion than Hemingway: he uses a higher percentage of emotion words, and as we assumed, this seems to be a product of their respective writing styles. However, there are several caveats, as our emotion lexicons take a very narrow view of emotion.

As our analysis reveals, Hemingway is not emotionless by any means. Instead, his writing style is the primary reason he appears to express less emotion in his writing. Our hypothesis assessed emotion through a narrow lens, and a better way of stating the result would be that Fitzgerald wrote with more easily identifiable emotion words than Hemingway, rather than that Hemingway’s characters lacked emotional depth. Our findings also helped to counter some initial fleeting assumptions we made about the corpus, like Fitzgerald’s characters being sadder or Hemingway’s characters being one-dimensionally angry, and allowed for a more nuanced understanding of their characters.

Reflection

R provided us with tools to view the texts and an author’s work at a macro level, and allowed us to deal with a large quantity of data in a short period of time. This was useful as it presented trends across novels for different authors. The visualisations also allowed for the easy analysis and presentation of findings. Given that the main rule of writing is to show, not tell readers how a character feels, one may wonder if these are worthwhile tools to explore emotion at all, considering they only pick up on what is explicitly told to them. But they are very powerful tools to analyse writing style as well as emotion in relation to other variables in the text, like gender or character.

The activity involved moving away from a typical close reading of literature, which presented its own challenges. A significant issue is that emotion lexicons and sentiment scores are not useful for picking up on subtext. They cannot measure emotions expressed through metaphors, details of non-verbal expression, or some dialogue. They are more useful for identifying overt methods of expressing emotion. Along with this, it is difficult for even the most elaborate emotion lexicons to be entirely comprehensive, so several expressions of emotion could have been missed.

Another issue with our analysis was that we generated an overall emotion score, instead of examining sentiment around a concept (such as sentiment around war in the novels). This proved to be slightly broad compared to drilling down on a topic. However, it was appropriate for our hypothesis, which sought to explore emotion in the authors’ writing styles.